A Topical Crawler for Uncovering Hidden Communities of Extremist Micro-Bloggers on Tumblr

نویسندگان

  • Swati Agarwal
  • Ashish Sureka
چکیده

Research shows that microblogging websites such as Tumblr are being misused as a platform to disseminate hate and extremism. We formulate the problem of locating such extremist communities as a graph search problem. We propose a topical crawler based approach performing several tasks: searching for a blogger, computing its similarity against exemplary documents, filtering hate promoting bloggers, navigating through links to other bloggers and managing a queue of such bloggers for social network analysis. We conduct experiments on real world dataset and examine the e↵ectiveness of ’like’ and ’reblog’ features as links between bloggers. Experimental results demonstrates that the proposed solution approach is e↵ective with an F-score of 0.80.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spider and the Flies : Focused Crawling on Tumblr to Detect Hate Promoting Communities

Tumblr is one of the largest and most popular microblogging website on the Internet. Studies shows that due to high reachability among viewers, low publication barriers and social networking connectivity, microblogging websites are being misused as a platform to post hateful speech and recruiting new members by existing extremist groups. Manual identification of such posts and communities is ov...

متن کامل

Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats

Research shows that various social media platforms on Internet such as Twitter, Tumblr (micro-blogging websites), Facebook (a popular social networking website), YouTube (largest video sharing and hosting website), Blogs and discussion forums are being misused by extremist groups for spreading their beliefs and ideologies, promoting radicalization, recruiting members and creating online virtual...

متن کامل

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

A multi-region empirical study on the internet presence of global extremist organizations

Extremist organizations are heavily utilizing Internet technologies to increase their abilities to influence the world. Studying those global extremist organizations’ Internet presence would allow us to better understand extremist organizations’ technical sophistication and their propaganda plans. In this work, we explore an integrated approach for collecting and analyzing extremist Internet pr...

متن کامل

Rules Design in Word Segmentation of Chinese Micro-Blog

This paper proposed a Hidden Markov Model (HMM) based tokenizer for Chinese micro-blog texts. Comparing with normal Chinese texts, micro-blog texts contain more uncertainties. These uncertainties are generally aroused by the irregular use of bloggers (such as network words, dialect words, wrong written characters, mixture of foreign words and symbols, etc.). Besides the lack of the annotated tr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015